
Conversation

Collaborator

@dtrawins dtrawins commented Dec 9, 2025

🛠 Summary

CVS-177455
Added a tool parser for the Devstral model.

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

DevstralToolParser() = delete;
DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
BaseOutputParser(tokenizer),
argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
Collaborator

@dkalinowski dkalinowski Dec 10, 2025

How do we ensure that [ARGS] / [TOOL_CALLS] are single tokens, treated as special tokens and not as plain strings that could split into, for example, [AR and GS]?

Collaborator Author

Those are special tokens.

Collaborator

That doesn't answer my question.

Collaborator Author

The Devstral parser sets requiresStreamingWithSpecialTokens() to true.
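
For reference, a minimal sketch of that override (assuming BaseOutputParser declares the method as virtual):

bool requiresStreamingWithSpecialTokens() const override {
    // Devstral needs the [TOOL_CALLS] / [ARGS] special tokens preserved in the streamed text.
    return true;
}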

DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
BaseOutputParser(tokenizer),
argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
botTokenId(tokenizer.encode("[TOOL_CALLS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
Collaborator

Validate that the input_ids token count is == 1?

Collaborator

Agreed. We could also do that for argsTokenId and fail if, in either case, encoding produces more than one token.
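
For example, a minimal sketch of such a check (e.g., in a helper used from the initializer list; assuming ov::Tensor::get_size() gives the number of encoded token ids for a single input):

auto encoded = tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids;
if (encoded.get_size() != 1) {
    throw std::runtime_error("[ARGS] must be a single token in the tokenizer vocabulary.");
}
const int64_t argsTokenId = encoded.data<int64_t>()[0];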

ToolCall toolCall;
std::string tool_name = tokenizer.decode(tool_name_tokens, ov::AnyMap{ov::genai::skip_special_tokens(true)});
if (this->toolSchemas.find(tool_name) == this->toolSchemas.end()) {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", tool_name);
Collaborator

This is behavior we haven't implemented in other parsers; is it really worth returning early? If we return a function name that is not part of the tool schemas spec, we might be able to debug it in BFCL.

Collaborator

This is not in line with the current behavior of other parsers. I wouldn't add that check if it's only for this parser. Either drop it or create a task to align the other parsers.

if (pos == 0) {
this->streamContent.clear();
} else {
this->streamContent = this->streamContent.substr(pos + 13); // "[TOOLS_CALLS]" length is 13
Collaborator

We should avoid magic numbers; if we change this->streamingParsingToolCallsStartTag to another value, this part will become incorrect.

Collaborator

You can look at how Adrian handles that in the Qwen coder parser:

this->lastProcessedPosition = pos + Qwen3CoderToolParser::PARAMETER_END_TAG.length();
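
Applied here, a sketch of the same approach, reusing the existing tag members instead of hard-coded lengths:

this->streamContent = this->streamContent.substr(pos + this->streamingParsingToolCallsStartTag.length());
// and likewise pos + this->streamingParsingArgsStartTag.length() for the [ARGS] case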

ToolCall toolCall;
toolCall.arguments = arguments;
toolCall.name = this->toolName;
return sendFullDelta(toolCall);
Collaborator

Shouldn't we stream partial function argument chunks? If I understand correctly, you send the full delta at the end of generation.

Collaborator

We already accepted such an approach for Qwen3 Coder, so I suppose we can have it in other parsers as well, unless there are specific requirements for "real" streaming.

bool requiresStreamingWithSpecialTokens() const {
-    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
+    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) ||
         (toolParser && toolParser->requiresStreamingWithSpecialTokens());
Collaborator

@mzegla when I implemented it, I remember your comment about why it should really be && instead of ||. Do you remember what the reason was?

Collaborator

This has been implemented this way to make sure we don't allow two parsers with different special-token approaches, as they will receive the same model output, so they must both either require it or not.
I guess for this case, where we don't have a reasoning parser but want to require special tokens for the tool parser, we should modify this function like:

if (!reasoningParser) {
    return toolParser && toolParser->requiresStreamingWithSpecialTokens();
} else if (!toolParser) {
    return reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens();
} else {
    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
           (toolParser && toolParser->requiresStreamingWithSpecialTokens());
}
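
An equivalent condensed sketch (same semantics: at least one parser must be present, and every parser that is present must require special-token streaming):

bool reasoningOk = !reasoningParser || reasoningParser->requiresStreamingWithSpecialTokens();
bool toolOk = !toolParser || toolParser->requiresStreamingWithSpecialTokens();
return (reasoningParser || toolParser) && reasoningOk && toolOk;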

}

TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithSingleToolCall_MissingEndTag) {
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
Collaborator

Can you add tests for scenarios with whitespace between the tags? I've seen other models often put spaces or new lines before/after the function name.

{"{\"", ov::genai::GenerationFinishReason::NONE, std::nullopt},
{"city\":", ov::genai::GenerationFinishReason::NONE, std::nullopt},
{" \"Paris", ov::genai::GenerationFinishReason::NONE, std::nullopt},
// Last chunk is added in the for loop below
Collaborator

Missing test for the enclosing scenario.

Collaborator

Add a test for empty args.
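
For example, a sketch following the existing test inputs in this PR (example_tool is the tool name used by the other tests):

std::string testInput = "[TOOL_CALLS]example_tool[ARGS]{}</s>";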

Comment on lines 31 to 36
const int64_t argsTokenId; // [ARGS]
const int64_t botTokenId; // [TOOL_CALLS]

// in streaming mode we can rely on tags in string format as tokens are not available
const std::string streamingParsingArgsStartTag = "[ARGS]";
const std::string streamingParsingToolCallsStartTag = "[TOOL_CALLS]";
Collaborator

Those tags/tokens are not specific to streaming, so I think we can drop the streamingParsing prefix.
These variables describe the same things, so please unify the naming:
either botToken or toolCallsStart:
toolCallsStartTokenId, toolCallsStartTag; or
botTokenId, botTag
and either argsTokenId or argsStartTag:
argsStartTokenId, argsStartTag; or
argsTokenId, argsTag

Collaborator

^^

Collaborator

So those variables are used only in streaming mode now; that's why this naming was selected. But I'm not sure about it, since it's essentially just a to-string mapping of special tokens that are not limited to streaming.

@dkalinowski , @atobiszei - what's your opinion on this?

Collaborator

Sure, checking other parsers I see the naming convention to be:

    static const std::string parsingStartTag;
    static const std::string parsingStartTag2;
    static const std::string parsingEndTag;
    const std::string parsingStartTag = "<|python_tag|>";
    const std::string parsingEndTag = "";

so following our convention it should be

    static const std::string parsingArgsStartTag; // = "[ARGS]"; <---- and implementation moved to cpp
    static const std::string parsingToolCallsStartTag; // = "[TOOL_CALLS]"; <--- and implementation moved to cpp
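
The matching definitions in the cpp file would then be something like (a sketch, using the class name from this PR):

const std::string DevstralToolParser::parsingArgsStartTag = "[ARGS]";
const std::string DevstralToolParser::parsingToolCallsStartTag = "[TOOL_CALLS]";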


EXPECT_EQ(parsedOutput.reasoning, "");
ASSERT_EQ(parsedOutput.toolCalls.size(), 0);
}

void DevstralToolParser::parse(ParsedOutput& parsedOutput, const std::vector<int64_t>& generatedTokens) {
std::vector<std::string> tools;
// Parser will consume entire model output only if the first generated token is the beginning of tools token.
Collaborator

It does not look like this comment is true for this parser.

}
}
if (this->internalState == AWAITING_ARGS_TAG) {
// check if [ARGS] tag is present in the chunk and update state accordingly
Collaborator

Suggested change
// check if [ARGS] tag is present in the chunk and update state accordingly
// check if [ARGS] tag is present in the streamContent and update state accordingly

Collaborator Author

Technically we check streamContent, but that will only be the case if [ARGS] was added in the chunk; otherwise we would be in a different state.

Collaborator

Still, in the line below we check streamContent, not the chunk.

if (pos != std::string::npos) {
this->internalState = PROCESSING_ARGS;
this->toolName = this->streamContent.substr(0, pos);
if (this->toolSchemas.find(this->toolName) == this->toolSchemas.end()) {
Collaborator

As for the unary part - this check is unique to this parser and I don't think it's a good idea to have different behavior for different parsers. Either remove it or create a task to align the other parsers.

SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", this->toolName);
return std::nullopt;
}
this->streamContent = this->streamContent.substr(pos + 6); // "[ARGS]" length is 6
Collaborator

Magic number

}
}
if (finishReason != ov::genai::GenerationFinishReason::NONE) {
size_t end_pos = this->streamContent.find("</s>");
Collaborator

What is this token? If it has some significant value for the parsing, it should be a member of the parser class, like the args and tool-calls tokens. Also:

Suggested change
size_t end_pos = this->streamContent.find("</s>");
size_t endPos = this->streamContent.find("</s>");
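
For example, a sketch following the static-member pattern suggested elsewhere in this review:

// header
static const std::string streamingEndTag;
// cpp
const std::string DevstralToolParser::streamingEndTag = "</s>";
// usage
size_t endPos = this->streamContent.find(streamingEndTag);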

exit /b 1
)

if exist "%~1\%DEVSTRAL_MODEL%\%TOKENIZER_FILE%" (
Collaborator

I think this whole section in the bat file could be replaced with something like:

:: ===========================
:: Tokenizer-only model calls
:: ===========================

call :EnsureTokenizer "%~1" "%QWEN3_MODEL%"   "%TOKENIZER_FILE%" "Qwen3"
call :EnsureTokenizer "%~1" "%LLAMA3_MODEL%"  "%TOKENIZER_FILE%" "Llama3.1"
call :EnsureTokenizer "%~1" "%HERMES3_MODEL%" "%TOKENIZER_FILE%" "Hermes3"
call :EnsureTokenizer "%~1" "%PHI4_MODEL%"    "%TOKENIZER_FILE%" "Phi-4"
call :EnsureTokenizer "%~1" "%MISTRAL_MODEL%" "%TOKENIZER_FILE%" "Mistral"
call :EnsureTokenizer "%~1" "%GPTOSS_MODEL%"  "%TOKENIZER_FILE%" "GPT-OSS"
call :EnsureTokenizer "%~1" "%DEVSTRAL_MODEL%" "%TOKENIZER_FILE%" "Devstral"

endlocal
exit /b 0

:: =========================================================
:: Function: EnsureTokenizer
:: %1 = base directory
:: %2 = model name
:: %3 = tokenizer file
:: %4 = display name
:: =========================================================
:EnsureTokenizer
if exist "%~1\%~2\%~3" (
  echo Models file %~1\%~2\%~3 exists. Skipping downloading models.
) else (
  echo Downloading tokenizer and detokenizer for %~4 model to %~1\%~2 directory.
  mkdir "%~1\%~2" 2>nul
  convert_tokenizer "%~2" --with_detokenizer -o "%~1\%~2"
  if errorlevel 1 exit /b %errorlevel%
)

if not exist "%~1\%~2\%~3" (
  echo Models file %~1\%~2\%~3 does not exist.
  exit /b 1
)

exit /b 0

Comment on lines +21 to +23
#include "../../../llm/io_processing/base_output_parser.hpp"
#include "../../../llm/io_processing/output_parser.hpp"
#include "../../platform_utils.hpp"
Collaborator

Suggested change
#include "../../../llm/io_processing/base_output_parser.hpp"
#include "../../../llm/io_processing/output_parser.hpp"
#include "../../platform_utils.hpp"
#include "src/llm/io_processing/base_output_parser.hpp"
#include "src/llm/io_processing/output_parser.hpp"
#include "src/platform_utils.hpp"


#include "src/port/rapidjson_document.hpp"

#include "../../../logging.hpp"
Collaborator

Suggested change
#include "../../../logging.hpp"
#include "src/logging.hpp"


#include "../../../logging.hpp"
#include "tool_parser.hpp"
#include "../utils.hpp"
Collaborator

Suggested change
#include "../utils.hpp"
#include "src/utils.hpp"

{"[ARGS]", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"id":"XXXXXXXXX","type":"function","index":0,"function":{"name":"get_weather"}}]}})"},
{"{\"", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]}})"},
{"city\":", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":"city\":"}}]}})"},
{" \"Paris", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"Paris"}}]}})"},
Collaborator

Aren't we missing \ before " here?

Suggested change
{" \"Paris", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"Paris"}}]}})"},
{" \"Paris", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"Paris\"}}]}})"},

@mzegla mzegla requested a review from Copilot January 7, 2026 09:52
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for the Devstral model's tool calling capabilities, implementing a custom tool parser that handles the model's specific format: [TOOL_CALLS]tool_name[ARGS]arguments.
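
For example, an output such as

[TOOL_CALLS]example_tool[ARGS]{"arg1":"value1","arg2":42}</s>

is parsed into a single tool call named example_tool with that JSON object as its arguments (mirroring the test inputs added in this PR).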

Key changes:

  • Implements DevstralToolParser for parsing tool calls in both streaming and non-streaming modes
  • Adds DevstralGenerationConfigBuilder to configure tool-guided generation with Devstral-specific tags
  • Integrates Devstral parser into the output parsing and generation config pipelines
  • Adds comprehensive test coverage for various tool call scenarios

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file

File - Description
windows_prepare_llm_models.bat - Adds model download logic for Devstral tokenizer
prepare_llm_models.sh - Adds model download logic for Devstral tokenizer (Linux)
src/test/llm/output_parsers/devstral_output_parser_test.cpp - Comprehensive test suite for Devstral tool parsing scenarios
src/llm/servable.cpp - Adds error handling for API handler creation
src/llm/io_processing/output_parser.hpp - Fixes logic for checking streaming requirements when parsers are optional
src/llm/io_processing/output_parser.cpp - Registers Devstral tool parser
src/llm/io_processing/generation_config_builder.hpp - Registers Devstral generation config builder
src/llm/io_processing/devstral/tool_parser.hpp - Header for Devstral tool parsing implementation
src/llm/io_processing/devstral/tool_parser.cpp - Core implementation of Devstral tool call parsing logic
src/llm/io_processing/devstral/generation_config_builder.hpp - Header for Devstral generation configuration
src/llm/io_processing/devstral/generation_config_builder.cpp - Implementation of Devstral-specific generation configuration
src/llm/BUILD - Updates build configuration to include Devstral files


Comment on lines +98 to +102
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

Suggested change
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
std::string testInput = "Reasoning before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";
auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoning before tool call ");

auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

}

TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithContentAndSingleToolCall) {
std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}</s>";
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call ");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

}

TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithInvalidOrder) {
std::string testInput = "Reasoninig before tool call [ARGS]example_tool[TOOL_CALLS]{\"arg1\":\"value1\",\"arg2\":42}</s>";
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

auto generatedTensor = devstralTokenizer->encode(testInput, ov::genai::add_special_tokens(false)).input_ids;
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
EXPECT_EQ(parsedOutput.content, "Reasoninig before tool call example_tool{\"arg1\":\"value1\",\"arg2\":42}");
Copilot AI Jan 7, 2026

Corrected spelling of 'Reasoninig' to 'Reasoning'.

namespace ovms {

/*
* Phi4GenerationConfigBuilder extends BaseGenerationConfigBuilder to provide specific configuration for Phi-4 model.
Copilot AI Jan 7, 2026

The comment incorrectly refers to Phi4GenerationConfigBuilder instead of DevstralGenerationConfigBuilder. Update the comment to accurately describe the Devstral class.

Suggested change
* Phi4GenerationConfigBuilder extends BaseGenerationConfigBuilder to provide specific configuration for Phi-4 model.
* DevstralGenerationConfigBuilder extends BaseGenerationConfigBuilder to provide specific configuration for the Devstral model.

Collaborator

@mzegla mzegla left a comment

Comment on lines +107 to +112
// now we need to add string toolCall.arguments to argumentsWrapper under "arguments" key

toolCallsString.SetString(toolCall.arguments.c_str(), wrappedDelta.GetAllocator());
functionObj.AddMember("arguments", toolCallsString, wrappedDelta.GetAllocator());
toolCallObj.AddMember("function", functionObj, wrappedDelta.GetAllocator());
toolCalls.PushBack(toolCallObj, wrappedDelta.GetAllocator());
Collaborator

Why do you mix the function name and arguments in one delta? Other parsers follow the pattern where the first delta contains the function name and other metadata, and arguments are sent in the following deltas.

Collaborator

Looks like this particular case is not covered in other parsers - likely due to a more verbose format where we don't see the whole output provided as one chunk. In that case I'm more open to accepting this change even though it breaks the "no arguments in the first delta" rule that we have followed so far.

*/

this->streamContent += chunk;
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));
Collaborator

That's quite detailed information. Should it really be on DEBUG and not on TRACE?

size_t pos = chunk.find(this->streamingParsingToolCallsStartTag);
if (pos != std::string::npos) {
this->internalState = AWAITING_ARGS_TAG;
std::cout << "Found [TOOL_CALLS] tag in chunk."
Collaborator

Remove it or switch to using the logger.
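
For example, if the message is worth keeping, a sketch using the existing logger:

SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Found {} tag in chunk.", this->streamingParsingToolCallsStartTag);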


DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
BaseOutputParser(tokenizer),
argsTokenId([&tokenizer, this]() {
// can not use streamingParsingArgsStartTag because object is not initialized yet
Collaborator

It is not initialized because it is a class field, which adds memory overhead. Why can't this be a static const field? It would be initialized early and could be used here.

Collaborator

To do that, simply declare the field as

static const std::string streamingEndTag;

in the header file, and then define it in the cpp file:

const std::string DevstralParser::streamingEndTag = "</s>";

Comment on lines 68 to 70
if (encoded.get_shape()[0] != 1) {
throw std::runtime_error("[TOOL_CALLS] must be a single token in the tokenizer vocabulary.");
}
Collaborator

Suggested change
if (encoded.get_shape()[0] != 1) {
throw std::runtime_error("[TOOL_CALLS] must be a single token in the tokenizer vocabulary.");
}
if (encoded.get_shape().size() != 1) {
throw std::runtime_error("[TOOL_CALLS] token shape must have 1 dimension");
}
if (encoded.get_shape()[0] != 1) {
throw std::runtime_error("[TOOL_CALLS] must be a single token in the tokenizer vocabulary.");
}

Comment on lines +89 to +90
static const std::string toolCallEndTag = "</s>";
return toolCallEndTag;
Collaborator

Suggested change
static const std::string toolCallEndTag = "</s>";
return toolCallEndTag;
return streamingEndTag;

code duplication

*/

this->streamContent += chunk;
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));
Collaborator Author

Suggested change
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Chunk content: '{}', StreamContent: '{}', State: {}", chunk, this->streamContent, std::to_string(this->internalState));

size_t pos = chunk.find(this->streamingParsingToolCallsStartTag);
if (pos != std::string::npos) {
this->internalState = AWAITING_ARGS_TAG;
std::cout << "Found [TOOL_CALLS] tag in chunk."
Collaborator Author

Suggested change
std::cout << "Found [TOOL_CALLS] tag in chunk."

if (pos != std::string::npos) {
this->internalState = AWAITING_ARGS_TAG;
std::cout << "Found [TOOL_CALLS] tag in chunk."
<< " Current state: " << this->internalState << std::endl;
Collaborator Author

Suggested change
<< " Current state: " << this->internalState << std::endl;

}
}
if (this->internalState == AWAITING_ARGS_TAG) {
// check if [ARGS] tag is present in the chunk and update state accordingly
Collaborator Author

Suggested change
// check if [ARGS] tag is present in the chunk and update state accordingly
